Classification and clustering for samples of event time data using non-homogeneous Poisson process models

نویسندگان

  • Duncan Barrack
  • James Goulding
  • Simon Preston
چکیده

Data of the form of event times arise in various applications. A simple model for such data is a non-homogeneous Poisson process (NHPP) which is specified by a rate function that depends on time. We consider the problem of having access to multiple independent samples of event time data, observed on a common interval, from which we wish to classify or cluster the samples according to their rate functions. Each rate function is unknown but assumed to belong to a small set of rate functions defining distinct classes. We model the rate functions using a spline basis expansion, the coefficients of which need to be estimated from data. The classification approach consists of using training data for which the class membership is known and to calculate maximum likelihood estimates of the coefficients for each group, then assigning test samples to a class by a maximum likelihood criterion. For clustering, by analogy to the Gaussian mixture model approach for Euclidean data, we consider a mixture of NHPP models and use the expectation maximisation algorithm to estimate the coefficients of the rate functions for the component models and probability of membership for each sample to each model. The classification and clustering approaches perform well on both synthetic and real-world data sets considered. Code associated with this paper is available at https://github.com/duncan-barrack/NHPP.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Oil Reservoirs Classification Using Fuzzy Clustering (RESEARCH NOTE)

Enhanced Oil Recovery (EOR) is a well-known method to increase oil production from oil reservoirs. Applying EOR to a new reservoir is a costly and time consuming process. Incorporating available knowledge of oil reservoirs in the EOR process eliminates these costs and saves operational time and work. This work presents a universal method to apply EOR to reservoirs based on the available data by...

متن کامل

Experimental Evaluation of Algorithmic Effort Estimation Models using Projects Clustering

One of the most important aspects of software project management is the estimation of cost and time required for running information system. Therefore, software managers try to carry estimation based on behavior, properties, and project restrictions. Software cost estimation refers to the process of development requirement prediction of software system. Various kinds of effort estimation patter...

متن کامل

The optimal age-based replacement policy for systems subject to shocks

In this article, two different systems subject to shocks occurring based on a non-homogeneous Poisson process (NHPP) are analyzed. Type –I system is consisted of a single unit and type –II system is consisted of two parallel units in which both units operate identically and simultaneously. In type –I system occurrence of a shock causes system stopping and consequently will be received minimal r...

متن کامل

به کارگیری روش‌های خوشه‌بندی در ریزآرایه DNA

Background: Microarray DNA technology has paved the way for investigators to expressed thousands of genes in a short time. Analysis of this big amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering technique in microarray DNA analysis. Materials and methods: We analyzed data of Van’t Veer et al study dealing with BRCA1...

متن کامل

Improving Imbalanced data classification accuracy by using Fuzzy Similarity Measure and subtractive clustering

 Classification is an one of the important parts of data mining and knowledge discovery. In most cases, the data that is utilized to used to training the clusters is not well distributed. This inappropriate distribution occurs when one class has a large number of samples but while the number of other class samples is naturally inherently low. In general, the methods of solving this kind of prob...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1703.02111  شماره 

صفحات  -

تاریخ انتشار 2017